Abstract

We present a fast and transparent multi-variate event classification technique,
called PDE-RS, which is based on sampling the signal and background densities in
a multi-dimensional phase space using range-searching. The employed algorithm is
presented in detail and its behaviour is studied with simple toy examples
representing basic patterns of problems often encountered in High Energy Physics
data analyses. In addition, an example relevant for the search for
instanton-induced processes in deep-inelastic scattering at HERA is discussed.
For all studied examples, the new method performs as well as artificial Neural
Networks and has the further advantage of requiring less computation time. This
makes it possible to select carefully the combination of observables which
optimally separates signal from background and for which the simulations
describe the data best. Moreover, the systematic and statistical uncertainties
can be easily evaluated. The method is therefore a powerful tool for finding a
small number of signal events in the large data samples expected at future
particle colliders.


Keywords: probability density estimation, multi-variate discrimination technique,
range-searching, event classification, Neural Networks, instanton-induced
processes, deep-

1 Introduction

In High Energy Physics one is frequently confronted with the task of finding a
small number of distinctive (signal) events among a large number of background
events. This problem is often tackled by simply applying cuts on measured
characteristic observables, motivated by models describing the signal events.
However, in particular in complex cases where many observables have to be used,
more powerful techniques exist, which are based on probability density
estimation (PDE) or which employ Neural Networks (NNs). They first combine the
observables into a single one, called the “discriminant”, on which a cut is then
applied to separate signal from background. These methods were widely used in
the search for the top quark at the TEVATRON [?] and will play a major role in
the search for the Higgs boson in the future [?]. For a general introduction to
multi-variate discrimination techniques see [5].

A drawback of these methods is often that they come as a “black box” which
provides little insight into the statistical and systematic uncertainties of the
results obtained. We present here a novel discrimination technique based on
counting signal and background events in small multi-dimensional boxes. The
simple event counting makes it possible to handle the involved uncertainties
transparently and to separate signal and background events with a power similar
to NNs. The event counting is done using a fast range-searching algorithm. Its
speed makes it possible to use the very large data samples required in analyses
where a high reduction of background is necessary, or to scan a large number of
observables for those which give the best separation of signal and background
and for which the Monte Carlo simulations describe the data best. This technique
has already been used in recent searches at HERA [?].

2 Probability Density Estimation Techniques 

In order to classify an event it is necessary to estimate the probability $p(x)$
that an event is of the signal class, given $d$ measured properties (called
observables in the following) $x = (x_1, \ldots, x_d)$. An estimate $\hat{p}(x)$
of $p(x)$ can be obtained by employing Monte Carlo simulators which approximate
the probability densities of the signal, $\hat{p}_s(x) \approx p_s(x)$, and of
the background events, $\hat{p}_b(x) \approx p_b(x)$, by sampling the
$d$-dimensional phase space with simulated events. The probability that a
particular event belongs to the signal class is then given by

$$ p(x) = \frac{p_s(x)}{p_s(x) + p_b(x)} \approx \hat{p}(x) = \frac{\hat{p}_s(x)}{\hat{p}_s(x) + \hat{p}_b(x)} \qquad (1) $$

The function $p(x)$ is a so-called “discriminant”, since it assigns to any given
combination of measured observables a single value which discriminates
background from signal events.

Finding good approximations of the signal and background densities can be a
rather difficult problem, especially in high-dimensional cases. Here,
histogramming methods cannot be used because the number of required bins
increases as $m^d$, where $m$ is the number of bins per dimension. This causes a
dramatic increase in memory usage and a decrease of the available number of
events per bin. The problem is aggravated by correlations among the observables,
which are common in High Energy Physics applications. Due to correlations, the
phase space is commonly populated only in a sub-space of lower dimensionality,
i.e. the intrinsic dimensionality of the problem is actually smaller. To
overcome the problem of the high dimensionality, methods are sometimes employed
which try to deduce the multi-dimensional probability density from projections.
These methods suffer from correlations among the observables, which are not
modelled by the projections.
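
The growth of the bin count can be made concrete with a few lines of Python. The sketch below is purely illustrative: the choice of 10 bins per dimension and a sample of one million events are assumptions, not values taken from the text, and the function name is hypothetical.

```python
def bins_and_occupancy(m, d, n_events):
    """Return the number of bins m**d and the mean number of events per bin."""
    n_bins = m ** d
    return n_bins, n_events / n_bins


# Assumed illustration: 10 bins per dimension, one million sample events.
for d in (1, 2, 5, 10):
    n_bins, occupancy = bins_and_occupancy(m=10, d=d, n_events=1_000_000)
    print(f"d={d:2d}  bins={n_bins:>12d}  events/bin={occupancy:g}")
```

Already at five dimensions the average occupancy in this illustration drops to ten events per bin, and at ten dimensions most bins are necessarily empty.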
Kernel-based PDE methods [?] sum up appropriately chosen kernel functions to
model the probability density around the point $x$:

$$ \hat{p}(x) = \frac{1}{N h^d} \sum_{i=1}^{N} K\!\left(\frac{x - x_i}{h}\right) \qquad (2) $$

where the sum runs over all sample events $x_i$, $N$ is the total number of
events in the data sample and $h$ is a smoothing parameter. For the kernel
function $K$ often a Gaussian distribution is chosen:

$$ K(x) = \frac{1}{(2\pi)^{d/2}} \exp\!\left(-\frac{|x|^2}{2}\right) \qquad (3) $$

making the resulting distribution continuous and differentiable. Since eq. (2)
needs to be evaluated for every event which is classified, involving the sum
over all $N$ sample events, these methods are very time-consuming for large data
samples. To avoid this problem, kernel functions which are non-zero only in a
small region around $x$ have to be used.
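
To illustrate why kernel methods become slow for large samples, the following minimal Python sketch evaluates a kernel density estimate with the standard textbook Gaussian product kernel. The function names and the exact normalisation are assumptions made for illustration; the point is only that every single evaluation loops over all $N$ sample events.

```python
import math


def gaussian_kernel(u):
    """Unit-width Gaussian kernel for a d-dimensional point u (textbook form)."""
    d = len(u)
    return math.exp(-0.5 * sum(c * c for c in u)) / (2.0 * math.pi) ** (d / 2.0)


def kernel_density(x, sample, h):
    """Density estimate at x; the loop over the full sample costs O(N) per call."""
    d = len(x)
    total = 0.0
    for xi in sample:
        u = [(a - b) / h for a, b in zip(x, xi)]
        total += gaussian_kernel(u)
    return total / (len(sample) * h ** d)


# Tiny usage example with three one-dimensional sample events.
print(kernel_density([0.0], [[0.0], [0.5], [2.0]], h=1.0))
```

Classifying $M$ events therefore costs of the order of $M \cdot N$ kernel evaluations, which is the scaling that a kernel with compact support, combined with a fast search for nearby events, avoids.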

A powerful method to estimate $p(x)$ is provided by artificial Neural Networks
[?]. Their design is inspired by biological neurons. In a training phase they
parameterise the probability density by linear combinations of smooth functions.
Using training events of known type, the free parameters, usually called
weights, are adjusted. The convergence of this procedure is usually fast, which
reduces the time requirements compared to kernel-based PDE methods. Given
sufficiently large NNs with a high number of nodes, even very complicated
probability densities can be approximated. Their good performance and their fast
applicability make them a good candidate to compare the new method with. In many
problems a short computing time is crucial: it allows a meaningful reduction of
the input observables by studying combinations of them, and it is needed to
handle the large data samples required in searches where the background has to
be strongly reduced.

3 Probability Density Estimation based on Range-Searching 
3.1 The PDE-RS Method 

The multi-variate probability density estimation technique based on
range-searching (PDE-RS) counts the number of Monte Carlo generated signal and
background events in the vicinity of an event which is to be classified. From
the counted events the probability of this event to be of the signal class is
derived. This is done in an efficient way using range-searching with an
algorithm described below. Given the number of signal events $n_s$ and the
number of background events $n_b$ in a small volume $V(x)$ around the point $x$,
we define a discriminant

$$ D(x) = \frac{n_s}{n_s + c\, n_b} \qquad (4) $$

which for sufficiently small volumes $V(x)$ and a sufficiently high number of
sample events gives a very good approximation of $p(x)$, if the normalisation
constant $c$ is chosen such that the total number of simulated signal events is
equal to $c$ times the total number of background events. $D(x)$ provides a good
estimate of the local event density and prevents a wrong classification due to
bad interpolation into regions where the Monte Carlo simulations provide no
information. Such misclassification can occur if methods fit the event density
globally, as is done e.g. by Neural Networks. However, in the PDE-RS method a
large number of Monte Carlo generated events is needed to densely populate the
whole phase space. This can be a limiting factor if the number of observables,
and thus the dimensionality of the problem, is high. For the counting method, a
large number of events has to be counted in the vicinity of each event to be
classified, which is a potentially time-consuming task. This problem is known as
“range-searching” in computer science.
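
As a baseline, the discriminant can be computed by brute force, scanning all sample events for every event to be classified. The Python sketch below is an assumption-laden illustration (hypothetical function names, a hypercube of half-width `half_width` as the counting volume); it shows the quantity that range-searching is meant to compute faster, not the authors' implementation.

```python
def count_in_box(x, sample, half_width):
    """Count sample events whose every coordinate lies within half_width of x."""
    return sum(
        all(abs(a - b) <= half_width for a, b in zip(x, event))
        for event in sample
    )


def discriminant(x, signal, background, half_width):
    """D(x) = n_s / (n_s + c * n_b) with c = N_signal / N_background.

    Returns None if the box contains no sample events at all.
    """
    c = len(signal) / len(background)
    n_s = count_in_box(x, signal, half_width)
    n_b = count_in_box(x, background, half_width)
    if n_s + c * n_b == 0:
        return None
    return n_s / (n_s + c * n_b)


# Toy samples (invented for illustration only).
signal = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (6.0, 6.0)]
background = [(0.0, 0.05), (9.0, 9.0)]
print(discriminant((0.0, 0.0), signal, background, half_width=0.2))
```

Each call scans both samples completely, so the cost per classified event grows linearly with the sample size.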

Range-searching has been studied intensively for several years, because the
problem of finding specific events in a large data sample occurs in all sorts of
classification tasks. Powerful algorithms have been devised to tackle it [?].
Two different classes of algorithms are usually applied. The first subdivides
the entire volume of the observable space into small boxes and stores the events
within the boxes in linked lists^1. Searching for events then just involves
looking up which boxes are in the vicinity of the event that is to be classified
and then simply scanning the linked lists for events within a certain distance.
The second class of algorithms uses multi-dimensional binary trees to store the
events. An algorithm of this class, as described in [10], is used here. The
advantage is that, in contrast to the subdivision algorithm, the extent of the
observable space need not be known, i.e. the minima and maxima of the
observables need not be calculated beforehand. In addition, subdivision
algorithms have a huge memory consumption if the dimensionality of the problem
is large, since they have to store the pointers to the linked lists in an array
of the dimension of the problem, even if no event lies in a box. Empty boxes
are, however, expected for High Energy Physics events, for which the observables
describing their properties are in many cases correlated, leaving a large
fraction of the phase space empty.


3.2 The PDE-RS Algorithm 


The algorithm used for the event classification is based on the range-searching
algorithm described in [10]^2. It makes it possible to search through $N$ Monte
Carlo generated events that sample the signal and background density within a
time $\sim \log_2(N)$. To achieve this scaling of the algorithm with the total
number of events, all $N$ events are first stored in two $d$-dimensional binary
trees, one for the background and one for the signal events, as sketched in
figure 1 for a two-dimensional example: Consider a random sequence of signal
events $e_i(x_1, x_2)$, $i = 1 \ldots 7$, shown in figure 1a with their position
in $x_1$-$x_2$ space, which are to be stored in a binary tree. The first event
in the sequence becomes by definition the topmost node of

1 A data structure with a data element (the event) and a pointer to the next element. 

2 Some program code may also be found there. The full C++ implementation as it
is used here is available from the authors (carli@mail.desy.de,
koblitz@mail.desy.de).

Fig. 1: a) Signal events distributed in a 2-dimensional phase space spanned by
the observables $x_1$ and $x_2$. The numbers refer to the order of occurrence
of the events in the data sample. b) The resulting binary tree storing the
events. c) Signal and background events in the $x_1$-$x_2$ plane. The open
triangle depicts the event to be classified. The box around the triangle
shows the region where events are considered to calculate the discriminant.
The lines and numbers illustrate how the events in the box are found with
the help of the binary tree. A detailed description can be found in the
text.


the tree. The second event $e_2(x_1, x_2)$ has a larger $x_1$-coordinate than
the first event, therefore a new node is created for it and attached to the
first node as the right child (if the $x_1$-coordinate had been smaller, the
node would have become the left child). Event $e_3$ has a larger
$x_1$-coordinate than event $e_1$; it therefore should be attached to the right
branch below $e_1$. Since $e_2$ is already placed at that position, the
$x_2$-coordinates of $e_2$ and $e_3$ are now compared and, since $e_3$ has a
larger $x_2$, $e_3$ becomes the right child of the node with event $e_2$. Thus
the tree is filled sequentially by taking every event and, while descending the
tree, comparing its $x_1$ and $x_2$ coordinates with the events already in
place. Whether $x_1$ or $x_2$ is used for the comparison depends on the level
within the tree. On the first level $x_1$ is used, on the second level $x_2$, on
the third again $x_1$ and so on. The result for the events $e_i$ is shown in
figure 1b. The amount of time needed to fill the tree is
$\sim \sum_{i=1}^{N} \log_2(i) = \log_2(N!) = O(N \log_2(N))$. The last equality
can be easily verified with the help of Stirling’s formula.
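
The scaling of the fill time can also be checked numerically. The short Python sketch below (the function name is hypothetical) evaluates $\log_2(N!)$ through the log-gamma function and compares it with $N \log_2(N)$; by Stirling's formula, $\ln N! \approx N \ln N - N$, so the ratio approaches one from below as $N$ grows.

```python
import math


def fill_cost(n):
    """Sum of log2(i) for i = 1..n, i.e. log2(n!), computed via lgamma(n + 1)."""
    return math.lgamma(n + 1) / math.log(2.0)


for n in (10**3, 10**5, 10**6):
    ratio = fill_cost(n) / (n * math.log2(n))
    print(f"N={n:>8d}  log2(N!) / (N log2 N) = {ratio:.3f}")
```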

Finding all events within the tree which lie in a given box is done in a similar
way by comparing the bounds of the box with the coordinates of the events in the
tree. For example, if the whole box lies to the right of event $e_1$ as shown in
figure 1c, then only events on the branch below and including $e_2$ need to be
searched. This halves the number of events in question. Only if the coordinate
of the event in a node lies within the bounds of the box in the coordinate being
compared do both its children need to be searched. Searching the tree once
therefore requires an effort of only $\sim \log_2(N)$. Note that the whole tree
of Monte Carlo generated events needs to be kept in the main memory of the
computer to have a reasonably fast access time when comparing the coordinates.
Therefore, only the advent of computers with random access memory of the order
of hundreds of megabytes made it possible to use millions of events to sample
the signal and background densities.
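
The tree filling and the box search described above can be sketched compactly in Python. This is a minimal illustration under stated assumptions, not the authors' C++ implementation: the class and function names are hypothetical, events with an equal coordinate are sent to the right branch, and the search is written iteratively with an explicit stack.

```python
import random


class Node:
    """One stored event with links to its left and right child."""
    __slots__ = ("point", "left", "right")

    def __init__(self, point):
        self.point = point
        self.left = None
        self.right = None


def insert(root, point):
    """Insert a point; the compared coordinate cycles with the tree level."""
    if root is None:
        return Node(point)
    node, depth = root, 0
    while True:
        axis = depth % len(point)
        if point[axis] < node.point[axis]:
            if node.left is None:
                node.left = Node(point)
                return root
            node = node.left
        else:
            if node.right is None:
                node.right = Node(point)
                return root
            node = node.right
        depth += 1


def count_in_box(root, lo, hi):
    """Count stored points p with lo[k] <= p[k] <= hi[k] for every coordinate k."""
    count = 0
    stack = [(root, 0)]
    while stack:
        node, depth = stack.pop()
        if node is None:
            continue
        axis = depth % len(lo)
        if all(l <= c <= h for l, c, h in zip(lo, node.point, hi)):
            count += 1
        if lo[axis] < node.point[axis]:    # box reaches into the left subtree
            stack.append((node.left, depth + 1))
        if hi[axis] >= node.point[axis]:   # box reaches into the right subtree
            stack.append((node.right, depth + 1))
    return count


# Usage: fill a tree with random 2-dimensional events and count those in a box.
rng = random.Random(1)
events = [(rng.random(), rng.random()) for _ in range(500)]
tree = None
for event in events:
    tree = insert(tree, event)
print(count_in_box(tree, lo=(0.2, 0.3), hi=(0.5, 0.6)))
```

Both children of a node are visited only when the node's coordinate on the current level lies inside the box bounds; otherwise one whole branch is skipped, which is the source of the logarithmic search time for small boxes.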

Since the number of signal and background events used in the calculation of the
probability density estimate is known at each point in the phase space, the
uncertainty of this estimate due to the limited number of signal and background
events can be calculated. By inspecting eq. (4) we find that the statistical
error $\Delta D(x)$ is given by

$$ \Delta D(x) = \sqrt{ \left( \frac{c\, n_b}{(n_s + c\, n_b)^2}\, \Delta n_s \right)^2 + \left( \frac{c\, n_s}{(n_s + c\, n_b)^2}\, \Delta n_b \right)^2 } \qquad (5) $$

where $\Delta n_s$ and $\Delta n_b$ are the statistical uncertainties of the
signal and background, respectively.
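
The error propagation can be sketched in a few lines of Python. The function name is hypothetical, and the default choice of Poisson uncertainties, $\Delta n = \sqrt{n}$, is an assumption made for this illustration; the text itself does not specify how $\Delta n_s$ and $\Delta n_b$ are obtained.

```python
import math


def discriminant_with_error(n_s, n_b, c, dn_s=None, dn_b=None):
    """Return D = n_s / (n_s + c n_b) and its propagated uncertainty.

    If no count uncertainties are given, Poisson errors sqrt(n) are
    assumed (an assumption of this sketch, not taken from the source).
    """
    if dn_s is None:
        dn_s = math.sqrt(n_s)
    if dn_b is None:
        dn_b = math.sqrt(n_b)
    denom = n_s + c * n_b
    d = n_s / denom
    term_s = c * n_b / denom ** 2 * dn_s   # |dD/dn_s| * delta n_s
    term_b = c * n_s / denom ** 2 * dn_b   # |dD/dn_b| * delta n_b
    return d, math.sqrt(term_s ** 2 + term_b ** 2)


print(discriminant_with_error(n_s=100, n_b=100, c=1.0))
```

With 100 signal and 100 background events in the box and $c = 1$, each term contributes 0.025, so the uncertainty on $D = 0.5$ is about 0.035.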

Usually the calculated discriminant values of samples of events are histogrammed
and the performance of the discrimination technique is estimated by applying
cuts on this discriminant. For discriminant values falling into the same bin,
the uncertainties are of course correlated, because they are derived from the
same samples of signal and background events. Using the individual uncertainties
on $D$ for each event according to eq. (5) would overestimate the systematic
uncertainty^3. Instead, a good estimate of the systematic uncertainty of each
histogram bin is obtained by using in eq. (5) the total numbers of signal and
background events which fall into the phase space region corresponding to the
bin. We have used Monte Carlo experiments with statistically independent event
samples to confirm the validity of this estimate.

In order to influence the uncertainty of the probability density estimate, the
size of the box can be adapted. The lengths of the box edges are the only free
parameters of the algorithm. In the following examples, where the observables
are produced in similar ranges (i.e. they have similar “scales”), the number of
parameters is reduced by using a hypercube with edges of equal size $2l$. In
this case $l$ is the largest distance, in the maximum norm, of any counted event
to the centre of the box. In practical applications, where the observables can
have very different scales, the relative lengths of the edges can be deduced
from histograms of the observables before the method is applied. The box size
should be chosen large enough to obtain a reasonably small uncertainty on
$D(x)$. For boxes that are too large, however, a precise mapping of the
probability density onto boxes is not possible and the achievable separation is
therefore reduced. As will be shown in the following, the performance of the
PDE-RS method is not very sensitive to the box size.

3 We call the uncertainty of the distribution of $D$ induced by the statistical
uncertainties of the event samples used for the classification the systematic
uncertainty of $D$.

4 Properties of PDE-RS and Comparison to Neural Networks

In the following we study the properties and the performance of the PDE-RS
method using three simple examples chosen as prototypes of basic situations
encountered in High Energy Physics data analyses. The performance, the computing
time needed and the ease of application are compared to NNs. The following
examples have been chosen:

1. Two Gaussian probability densities, to study the simple case of uncorrelated
observables, where most multi-variate discrimination techniques give good
results.

2. Two strongly correlated observables, where an elementary variable
transformation greatly simplifies the classification problem, as an example of a
problem which can be easily solved if the relations between the observables are
known, e.g. from insight into the underlying physical laws. Such problems can
easily be solved by a physicist when a small number of observables is involved,
but can pose problems for classification algorithms.

3. A high-dimensional example, where a physicist has difficulty finding a good
separation by looking at the observable distributions and where multi-variate
discrimination techniques generally show their strength.

4.1 Bivariate Uncorrelated Gaussian Probability Densities 

A very simple example uses two two-dimensional Gaussians: for the background
with means $\langle x_{1,b} \rangle = 3$ and $\langle x_{2,b} \rangle = 4.5$ and
widths $\sigma_{1,b} = \sigma_{2,b} = 1$, and for the signal with means
$\langle x_{1,s} \rangle = 4$, $\langle x_{2,s} \rangle = 3.5$ and widths
$\sigma_{1,s} = \sigma_{2,s} = 0.75$. The distributions are shown in figure 2a.
In figure 2b the resulting probability density that an event is of signal type
is depicted. To calculate the probability density, 100,000 events have been
filled into each of the two binary trees and $\Delta D < 0.05$ has been
required. A box of $V(x_1, x_2) = 0.18 \cdot 0.18$ around each classified event,
in which sample events are counted, has been used. The distributions of the
discriminant $D$ of test signal and background events^4 are shown in figure 2c.
Most of the background events are correctly classified and have a small $D$
value. Since there is no phase space region where there are signal but no
background events, the discriminant does not peak at $D \approx 1$ but at a
somewhat lower value. The shaded area depicts the systematic uncertainty of the
discriminant according to eq. (5). Finally, the background rejection
$1 - \epsilon_b$, where $\epsilon_b$ is the background efficiency, is shown as a
function of the signal efficiency $\epsilon_s$ in figure 2d and is compared to
the result of a single hidden layer feed-forward NN^5 with 10 hidden nodes. In
an ideal case

4 In order not to bias the performance, the event samples are divided into one
sample used to set up the binary trees and one sample used to test the
performance of the classification algorithm.

5 We used a modified version of the package written in C++ by J. P. Ernenwein, available 
at http://e.home.cern.ch/e/ernen/www/NN/. 

Fig. 2: a) The phase space densities of background (open circles) and signal
(full circles) of two simple bivariate Gaussian distributions with displaced
means and equal widths, b) the probability density for signal type events
estimated by the PDE-RS method, c) the resulting shape of the discriminant
distribution for signal and background events. The band indicates the
statistical uncertainty of the discriminant. d) Background rejection,
$1 - \epsilon_b$, versus signal efficiency for the PDE-RS method and a NN,
obtained by cutting on the discriminant distributions.

where every event is classified correctly, the area below the background
rejection versus signal efficiency curve would be a square of area 1. For the
PDE-RS method we find an area of $0.876 \pm 0.01$, for the NN an area of 0.877.
The two methods are compatible. While the PDE-RS method performs slightly better
for lower signal efficiencies, its performance for $\epsilon_s \approx 1$ is
slightly below that of the NN. The signal efficiency $\epsilon_s$ actually never
reaches 1, since there are always events which cannot be classified due to the
requirement that $D(x)$ has to be calculated with sufficient statistical
precision, i.e. $\Delta D < 0.05$. The NN, on the other hand, classifies every
event regardless of the uncertainty of the fit to the probability density.
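
The figure of merit used here, the area under the background rejection versus signal efficiency curve, can be computed from two lists of discriminant values as in the following Python sketch. The function names are hypothetical and the handling of events that fail the $\Delta D$ requirement is not modelled; the quadratic cost of the simple loop is acceptable only for small illustrative samples.

```python
def rejection_vs_efficiency(sig_d, bkg_d):
    """Return (signal efficiency, background rejection) points for cuts D >= cut."""
    cuts = sorted(set(sig_d) | set(bkg_d), reverse=True)
    points = [(0.0, 1.0)]  # a cut above every value selects no event at all
    for cut in cuts:
        eff_s = sum(d >= cut for d in sig_d) / len(sig_d)
        eff_b = sum(d >= cut for d in bkg_d) / len(bkg_d)
        points.append((eff_s, 1.0 - eff_b))
    return points


def area_under_curve(points):
    """Trapezoidal integral of the rejection over the signal efficiency."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += 0.5 * (x1 - x0) * (y0 + y1)
    return area


# Toy discriminant values (invented): well separated, but not perfectly.
signal_d = [0.9, 0.8, 0.7, 0.4]
background_d = [0.1, 0.2, 0.3, 0.75]
print(area_under_curve(rejection_vs_efficiency(signal_d, background_d)))
```

A perfect classifier yields an area of 1, while indistinguishable signal and background distributions yield 0.5.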

4.2 Highly Correlated Observables 

In a second example we study the performance of the PDE-RS method for strongly
correlated input observables. Here, events are generated on a ring and smeared
by a Gaussian. The rings of signal and background have the same diameter
$R = 3$; the width of the Gaussian used to smear the signal is
$\sigma_s = 1/2$, while for the background it is $\sigma_b = 1/4$. Such an
example was also used in [11], where it was found that the high correlation of
the Cartesian variables $x_1$ and $x_2$ of the events makes classification very
difficult for NNs^6. The resulting signal and background event distributions are
shown in figure 3a along with the resulting signal probability in figure 3b. The
shape of the discriminant distributions for signal and background is depicted in
figure 3c. A slightly smaller volume $V = 0.12 \cdot 0.12$ compared to the
previous example was used, again with 100,000 events each to sample the signal
and background distributions. In figure 3d the performance of the PDE-RS method
is compared to a NN which has the same architecture as in the previous example.
This time the integrated area for the PDE-RS method is slightly larger
($0.708 \pm 0.031$) than for the Neural Network (0.691). However, the results of
the two methods are compatible. The NN was again trained with 100,000 events
using 10 hidden nodes. Since after a transformation to polar coordinates the
given example reduces to a one-dimensional problem which can be solved
analytically, the theoretically optimal efficiency curve is also shown. Its
integrated area is 0.705. The lower part of figure 3d shows the difference
between the curves for the PDE-RS method (NN) and the optimum. The fact that the
PDE-RS method performs slightly better than the theoretical optimum can be
explained by the statistical uncertainties.
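
The reduction to a one-dimensional problem can be illustrated with a short Python sketch. It rests on several assumptions that the text does not spell out: the smearing is taken to act on the radial coordinate only, the azimuth is uniform, $R = 3$ is treated as the nominal ring radius, and signal and background samples are of equal size. All function names are hypothetical.

```python
import math
import random


def gauss_pdf(r, mean, sigma):
    """Normal probability density evaluated at r."""
    return math.exp(-0.5 * ((r - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def ring_event(rng, radius, sigma):
    """One event on a ring: uniform azimuth, Gaussian-smeared radius (assumption)."""
    phi = rng.uniform(0.0, 2.0 * math.pi)
    r = rng.gauss(radius, sigma)
    return r * math.cos(phi), r * math.sin(phi)


def optimal_discriminant(x1, x2, radius=3.0, sigma_s=0.5, sigma_b=0.25):
    """Signal probability computed from the radial densities alone."""
    r = math.hypot(x1, x2)
    p_s = gauss_pdf(r, radius, sigma_s)
    p_b = gauss_pdf(r, radius, sigma_b)
    return p_s / (p_s + p_b)


rng = random.Random(42)
x1, x2 = ring_event(rng, radius=3.0, sigma=0.5)
print(optimal_discriminant(x1, x2))
```

Exactly on the ring the narrower background density is twice as high as the signal density, giving a signal probability of 1/3 under these assumptions; far from the ring the wider signal distribution dominates. In Cartesian coordinates the same information is spread over a curved region, which is what makes the problem hard for a method that must learn the correlation.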

While the performance of the PDE-RS method and the NN is similar, the time to
compute the result is not. The calculation using the PDE-RS method took 224
seconds on an 800 MHz Linux PC with 256 MByte of RAM. The same task took 34.6
hours for the training of every single NN, and several nets had to be tried
before the right combination of training parameters was found!

6 We find that the performance of the NN is good, if a very large number of training cycles 
is used, i.e. if a very large computation time is spent. 

Fig. 3: a) The phase space density of background (open circles) and signal (full
circles) events generated according to Gaussian smeared rings which are
highly correlated, b) the signal probability density in the $x_1$-$x_2$
plane, c) the resulting shape of the discriminant distribution for signal
and background events, d) background rejection, $1 - \epsilon_b$, versus
signal efficiency $\epsilon_s$ for the PDE-RS method and a NN, obtained by
cutting on the discriminant distributions. In the lower part of the figure
the difference of the PDE-RS method (NN) to the optimum is shown.

Fig. 4: a) Projections of the phase space density for the five-dimensional
example onto the observables $x_1$ and $x_2$. The background (signal) events
are shown as open (full) circles, b) the resulting shape of the discriminant
for signal and background events, c) background rejection, $1 - \epsilon_b$,
versus signal efficiency for the PDE-RS method and a NN, obtained by cutting
on the discriminant distributions.


4.3 High Dimensional Example 

In this example the behaviour for a large number of observables, i.e. a problem
with high dimensionality, is studied. We use a set of five moderately correlated
observables^7 which describe an event. In figure 4a the two-dimensional
projections onto the observables $x_1$ and $x_2$ of the five-dimensional
distribution of signal and background events are shown. 500,000 events were used
to populate the phase space and filled into the two binary trees. The PDE-RS
method can separate signal from background events using a hypercube with size
$l = 1.2$. This can be seen in figure 4b, where the shape of the discriminant
distribution for signal and background events is shown. The performances of the
NN and the PDE-RS method are compatible. The area under the background rejection
versus efficiency curve (see figure 4c) is $0.906 \pm 0.008$ for the PDE-RS
method and 0.910 for the NN. To reach this performance the NN had to be trained
with 500,000 events for 1000 training cycles using 10 hidden nodes. If 40 hidden
nodes are used, and thus 4 times more weights are available, the area under the
background rejection versus efficiency curve increases to 0.913, but at the same
time the computing time increases.

7 The example is constructed as follows: For every signal event a vector
$x' = (G(4,1), G(1,1), G(2,1.5), G(2,1), G(1.5,2))$ with components sampling
normal distributions $G(\langle x \rangle, \sigma(x))$ with mean
$\langle x \rangle$ and width $\sigma(x)$ is constructed. This vector is then
transformed according to

[transformation equation not recoverable from the source]

For background events, the initial vector is
$x' = (G(4,1), G(2,1), G(3,1.5), G(1,1), G(0.5,1))$.

Fig. 5: a) The dependence of the separation power,
$S = \epsilon_s/\epsilon_b$, for fixed signal efficiency on the box size and
the number of events stored in the binary trees for the five-dimensional
example, b) the computing time needed as a function of the box size of the
PDE-RS method, and for the NN. The arrow shows how a large number of events
allows smaller box sizes to be used, which reduces the computing time
needed.

Figure 5a shows the separation power $S := \epsilon_s/\epsilon_b$ at a signal
efficiency of $\epsilon_s = 70\%$ as a function of the box size $l$ for the
PDE-RS method. The separation power has a broad plateau and varies only within
20% when the box size is changed over a large range. This behaviour makes the
separation power nearly independent of the box size and makes it possible to use
the algorithm with a minimum of human intervention^8.
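
The separation power at a fixed signal efficiency can be evaluated from the discriminant values of test samples as sketched below in Python. The function name and the convention for choosing the cut (the loosest cut on $D$ that keeps at least the requested fraction of signal events) are assumptions of this illustration.

```python
import math


def separation_power(sig_d, bkg_d, target_eff_s):
    """S = eff_s / eff_b for the cut on D that retains the target signal fraction."""
    ranked = sorted(sig_d, reverse=True)
    k = max(1, math.ceil(target_eff_s * len(ranked)))
    cut = ranked[k - 1]                      # D value of the k-th best signal event
    n_s_sel = sum(d >= cut for d in sig_d)
    n_b_sel = sum(d >= cut for d in bkg_d)
    if n_b_sel == 0:
        return math.inf                      # no background survives the cut
    return (n_s_sel / len(sig_d)) / (n_b_sel / len(bkg_d))


# Toy discriminant values (invented for illustration only).
signal_values = [0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
background_values = [0.05] * 9 + [0.85]
print(separation_power(signal_values, background_values, target_eff_s=0.5))
```

Repeating this evaluation for a range of box sizes gives a curve of the kind described above, from which a plateau of stable separation power can be read off.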

The drop of the separation power towards larger box sizes is due to the less
accurate mapping of the phase space density onto boxes around the event to be
classified. On the other hand, boxes that are too small reduce the number of
events in the box and thus also degrade the resolution of the discriminant,
because neighbouring events might end up in different places of the discriminant
distribution due to statistical fluctuations. This can lead to a smearing of
neighbouring events across a larger part of the discriminant distribution.

Figure 5b shows the CPU time needed to compute the PDE-RS results as a function
of the box size used and, in addition, the time needed to train the NN with 10
nodes. Larger boxes strongly increase the computing time because, for all
candidate events found by the binary search within the trees, a time-consuming
check needs to be done whether the event actually falls into the box. The CPU
time increases logarithmically with the number of events used for
classification, as expected. However, using larger numbers of events for
classification also makes it possible to reduce the box size at the same
separation, as indicated by an arrow in figure 5a. In the region of good
performance of the PDE-RS method, the computing time is typically 10 times
smaller than for the NN.

When comparing the time consumption of the range searching algorithm to 

8 The relatively small dependence of the separation power on the choice of the box size l has 
also been verified for the other examples discussed here and seems to be a general feature of 
the PDE-RS method. 

Inclusive DIS kinematic variables:

$Q^2 = -q^2$

$x = Q^2 / (2 P \cdot q)$

Instanton kinematic variables:

$Q'^2 = -q'^2$

$x' = Q'^2 / (2 g \cdot q')$


Fig. 6: Sketch of an instanton-induced process in DIS and the definition of the
important kinematic variables for inclusive DIS processes and
instanton-induced processes. The four-vector of the exchanged photon
(incoming proton) is denoted by $q$ ($P$). The four-vector of the quark
(gluon) inducing the hard instanton process is denoted by $q'$ ($g$).


the time needed by a NN, it is interesting to note that the two algorithms
behave very differently. While for the PDE-RS algorithm the time spent in the
set-up period of filling the binary trees is more or less the time needed to
read the data from a storage medium, the time needed to train the artificial NN
is considerable. On the other hand, classifying a single event after the
initialisation phase takes longer with the range-searching algorithm, at least
when compared to NNs of moderate size. However, descending the binary trees to
collect events is a task which can naturally be done in parallel using multiple
threads of program execution.


5 A Practical Application: Instanton-induced Processes in DIS at HERA


5.1 Instanton-induced Processes in DIS at HERA 


Instantons [12] are a fundamental non-perturbative aspect of QCD, inducing hard
processes that are absent in perturbation theory. The expected cross section in
deep-inelastic electron-proton (ep) scattering as calculated in
“instanton-perturbation-theory” [?] is sufficiently large to make an
experimental discovery possible [?]. However, the background rate is about a
factor of 1000 larger, which is a challenging task for the classification
algorithm. For a more detailed introduction to instanton-induced processes ($I$)
see e.g. [?].

We study the prospect of a search for $I$-induced events modelled by the Monte
Carlo simulator QCDINS [16], which generates $I$-induced events in
deep-inelastic ep-scattering. In $I$-induced DIS processes (see figure 6) a
quark emerges from a $q\bar{q}$-splitting of the exchanged photon and fuses with
a gluon emitted from the

Fig. 7: The event observables characterising the hadronic final state of
$I$-induced processes in DIS at HERA providing good instanton separation
along with small systematic uncertainties. Shown are simulations of
$I$-processes (QCDINS) and of standard DIS background. The band indicates
the uncertainty due to different QCD models. The observables are explained
in the text.


proton. In the $I$-induced process $q\bar{q}$-pairs of each of the three light
quark flavours and on average 2-3 gluons are produced. In the hadronic
centre-of-mass system (hCMS) they form a band (of about two units in
pseudo-rapidity) of particles with high transverse energy which are
homogeneously distributed in azimuth. Since in every event a pair of strange
quarks is produced, an increased number of kaons compared to standard DIS events
is expected in this band. Finally, the quark out of the split photon not
participating in the $I$-subprocess forms a hard jet. $I$-induced events can be
distinguished from standard DIS background events by their characteristic
hadronic final state [?, ?, 17]. It is therefore necessary to find hadronic
final state observables which are well modelled by the background Monte Carlo
simulations and which provide the best possible reduction of the background.


5.2 Instanton Classification Results 

Starting with 35 observables based on the hadronic final state, the best 12
were chosen by calculating the discriminant with all 2-combinations (pairs)
of the initial observables and taking those observables which provide a high
separation power $S = \epsilon_s/\epsilon_b$, demanding an efficiency for
instantons of $\epsilon_s = 10\%$. The number of considered observables is
further reduced by calculating all 5-combinations and selecting those with
the highest separation power and a small systematic variation of the
background. The systematic uncertainty was obtained by using four standard
DIS-MC simulators [|l^] which were tuned to data on representative hadronic
final state quantities, in the range $Q^2 > 100$ GeV$^2$ at HERA |HJ. The
observables forming the best combination are shown in figure 7. These are:
the reconstructed virtuality of the quark entering the I-subprocess
$Q'^2_{rec}$, the sphericity of the particles in the I-band (see footnote 9)
in their rest system $Sph_B$, the second Fox-Wolfram moment 2TJ $FW_2$ of
these particles, the event shape observable $F_{out,B}$, which is the
projection of the particle transverse momentum onto the axis that makes this
quantity maximal |U], and finally the number of charged kaons in the I-band
(see [22] for a detailed description of the observables).

Fig. 8: a) Number of events expected for an integrated luminosity of
100 pb$^{-1}$ as a function of the discriminant for instanton-induced and
standard DIS processes. Only the signal region with D > 0.9 is shown.
b) Separation power S at $\epsilon_s = 10\%$ for different relative box
widths and different numbers of events in the binary trees.
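The figure of merit used in the observable selection above can be sketched
in a few lines of Python. This is only an illustration under stated
assumptions: the function name separation_power and the toy score lists are
hypothetical, and the cut is placed at the discriminant value that keeps the
requested fraction of signal events. The two binomial coefficients show how
many observable combinations the two selection steps have to scan.

```python
from math import comb

# Number of discriminants to evaluate in the two selection steps.
N_PAIRS = comb(35, 2)        # all pairs of the 35 initial observables: 595
N_QUINTUPLETS = comb(12, 5)  # all 5-combinations of the best 12: 792


def separation_power(sig_scores, bkg_scores, eff_s=0.10):
    """Return S = eps_s / eps_b for the cut that keeps a fraction eff_s of signal.

    sig_scores, bkg_scores: discriminant values of simulated signal and
    background events (hypothetical inputs, any sortable numbers).
    """
    ranked = sorted(sig_scores, reverse=True)
    n_keep = max(1, round(eff_s * len(ranked)))
    cut = ranked[n_keep - 1]  # lowest signal score that is still accepted

    n_sig_pass = sum(d >= cut for d in sig_scores)
    n_bkg_pass = sum(d >= cut for d in bkg_scores)
    if n_bkg_pass == 0:
        return float("inf")  # no background left: S is unbounded
    # eps_s / eps_b, written with integers to avoid rounding noise
    return (n_sig_pass * len(bkg_scores)) / (n_bkg_pass * len(sig_scores))
```

With several hundred combinations per step, the cost of one discriminant
evaluation determines whether such a scan is practical.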

The separation power for $\epsilon_s = 10\%$ is $S = 126$. In figure 8a the
expected number of I-induced events in DIS at HERA and the number of
background events are shown as a function of the discriminant D, for a data
sample with an integrated luminosity of 100 pb$^{-1}$. Only the signal region
defined by D > 0.9 is shown. The luminosity is comparable to the one already
collected by each of the HERA experiments H1 and ZEUS. An event sample can be
isolated where half of the events are instantons while the I-efficiency is
still 10%. The decreasing background model uncertainty in the signal region
reflects the choice of observables with a minimum background uncertainty,
which was possible due to the speed and flexibility of the PDE-RS method.
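The link between the separation power and the purity of the selected sample
can be made explicit with a short calculation. The symbols $N_s$ and $N_b$
(expected signal and background events before the cut on D) and $P$ (purity
after the cut) are introduced here for illustration only and do not appear in
the text.

```latex
% N_s, N_b: expected signal and background events before the cut (illustrative)
% P: fraction of signal events in the selected sample
\begin{align}
  S &= \frac{\epsilon_s}{\epsilon_b}, \\
  P &= \frac{\epsilon_s N_s}{\epsilon_s N_s + \epsilon_b N_b}
     = \frac{1}{1 + \dfrac{N_b}{S\,N_s}} .
\end{align}
% P = 1/2 exactly when N_b / N_s = S.
```

Read this way, a sample in which half of the events are instantons at a given
working point corresponds to a background-to-signal ratio before the cut that
is equal to S. This is an inference from the definitions, not a number quoted
in the text.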

9 The instanton band is defined to have a width of 2.2 units of rapidity
around the $E_T$-weighted mean rapidity of all particles except the jet of
the event, taken in the hCMS.

To reduce the number of parameters for the box size, the ratios of the box
edge lengths were fixed by defining a box which contains most of the events
and letting V be a scaled version of this large box. The projections onto
these box edges are shown in figure [|. The variation of the result depending
on the size of V is shown in figure 8b. The behaviour is similar to the one
in the toy model: the separation increases for smaller boxes with the number
of events that populate the search trees, while for larger boxes this
difference vanishes. The width of the plateau increases with the number of
events in the classification tree. If the data sample is large enough, the
width of the plateau spans one order of magnitude. This allows the use of an
approximate size parameter, which reduces the need for fine tuning, provided
a large enough Monte Carlo data sample is available.
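The single-parameter box described above can be sketched as follows. This is
a minimal illustration, not the implementation used for the results: events
are counted by brute force instead of with binary search trees, the function
names are hypothetical, and the discriminant is assumed to take the form
n_s / (n_s + c n_b), with c normalising the sizes of the two samples.

```python
def count_in_box(events, centre, half_widths):
    """Count events whose every coordinate lies inside the box around centre."""
    return sum(
        all(abs(x - c) <= h for x, c, h in zip(event, centre, half_widths))
        for event in events
    )


def discriminant(point, signal, background, reference_box, scale):
    """Estimate D at `point` from event counts in a scaled box V.

    reference_box: edge lengths of the large box containing most events.
    scale: the one free size parameter; V has edges scale * reference_box,
           so the ratios of the edge lengths stay fixed.
    """
    half_widths = [0.5 * scale * edge for edge in reference_box]
    n_s = count_in_box(signal, point, half_widths)
    n_b = count_in_box(background, point, half_widths)
    c = len(signal) / len(background)  # assumed sample-size normalisation
    denominator = n_s + c * n_b
    if denominator == 0:
        return None  # empty box: no estimate possible at this size
    return n_s / denominator
```

A very small scale leaves the box empty and gives no estimate, while a very
large one averages over the whole sample. Between these limits the result
changes only slowly with the scale, which corresponds to the plateau
described in the text.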

In addition, a comparison with a single-hidden-layer feed-forward NN was
done. Several network architectures were tried. The network performing best
had 3 layers with 100 hidden nodes. It reached a separation of S = 116 at an
I-efficiency of 10%, slightly worse than the PDE-RS method. To get this good
performance the high number of nodes was mandatory. Training the net was
rather time consuming and a lot of human intervention was needed to adjust
the training parameters.

6 Conclusions 

For the examples covering different basic problems appearing in High Energy
Physics data analyses, the new classification algorithm based on range
searching (PDE-RS) has a discrimination power which is comparable to the one
provided by Neural Networks. However, the PDE-RS method needs less
computation time, is more transparent and is rather insensitive to the choice
of the free parameters that need to be set manually. Moreover, the
classification error can be easily evaluated. For complex cases the speed of
the algorithm allows one to choose carefully the combination of observables
which gives the best separation and for which the observables and their
correlations are well described by the simulations. The PDE-RS method is
therefore a powerful tool to find a small number of signal events in the
large data samples expected at future colliders. It is particularly suited
for hadron colliders, where the background is large and the correct
background description is difficult.